Computer Vision Project 2

Author: Maja Noack Date: 2025-10-22

part_1

Part 1: Image Homography Estimation

Implement homography estimation and image warping to rectify an input image containing a planar surface (e.g., a document, a poster, or a road sign) captured at an angle.

What is Direct Linear Transformation (DLT)?

The Direct Linear Transformation (DLT) algorithm provides a linear method for estimating a projective transformation, such as a homography, from point correspondences. It is widely used in computer vision tasks including camera calibration, image alignment, and 3D reconstruction.

1. Problem Formulation

Given a set of corresponding points between two planes:

$$(x_i, y_i) \leftrightarrow (x'_i, y'_i)$$

we wish to determine the transformation matrix $H$ that maps one set to the other according to

$$ \begin{bmatrix} x'_i \\ y'_i \\ 1 \end{bmatrix} \sim H \begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix}, $$

where "$\sim$" denotes equality up to a nonzero scale factor due to the use of homogeneous coordinates.

2. Homography Representation

The homography matrix $H$ is a $3 \times 3$ matrix of the form

$$ H = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix}. $$

For a single correspondence $(x, y) \leftrightarrow (x', y')$, the mapping can be expressed as

$$ x' = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + h_{33}}, \qquad y' = \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + h_{33}}. $$

Multiplying both sides by the denominators yields two linear equations in the unknowns $h_{ij}$:

$$ x'(h_{31}x + h_{32}y + h_{33}) = h_{11}x + h_{12}y + h_{13}, $$ $$ y'(h_{31}x + h_{32}y + h_{33}) = h_{21}x + h_{22}y + h_{23}. $$
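The mapping above can be sketched in a few lines of NumPy. This is only an illustration: the function name `apply_homography` and the example matrix are assumptions and are not taken from the project code.

```python
import numpy as np

def apply_homography(H, pts):
    """Map an (n, 2) array of points through a 3x3 homography H."""
    pts = np.asarray(pts, dtype=float)
    ones = np.ones((pts.shape[0], 1))
    # Each row becomes H @ [x, y, 1] in homogeneous coordinates.
    mapped = np.hstack([pts, ones]) @ np.asarray(H, dtype=float).T
    # Dividing by the third coordinate gives the two fractions for x' and y'.
    return mapped[:, :2] / mapped[:, 2:3]

# Illustrative example: a pure translation by (2, 3) written as a homography.
H_shift = np.array([[1.0, 0.0, 2.0],
                    [0.0, 1.0, 3.0],
                    [0.0, 0.0, 1.0]])
shifted = apply_homography(H_shift, [[0, 0], [1, 1]])
rescaled = apply_homography(5.0 * H_shift, [[0, 0], [1, 1]])
```

Multiplying `H_shift` by any nonzero constant leaves the mapped points unchanged, which is the scale ambiguity expressed by "$\sim$".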

3. Construction of the Linear System

Each point correspondence contributes two equations that can be written in matrix form as

$$ \begin{bmatrix} 0 & 0 & 0 & -x_i & -y_i & -1 & y'_i x_i & y'_i y_i & y'_i \\ x_i & y_i & 1 & 0 & 0 & 0 & -x'_i x_i & -x'_i y_i & -x'_i \end{bmatrix} \begin{bmatrix} h_{11}\\h_{12}\\h_{13}\\h_{21}\\h_{22}\\h_{23}\\h_{31}\\h_{32}\\h_{33} \end{bmatrix} = 0. $$

Stacking all $n$ correspondences results in a matrix equation

$$A h = 0,$$

where $A$ is a $2n \times 9$ matrix and $h$ is the vector of unknown parameters.
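As a minimal sketch, the matrix $A$ can be assembled row by row exactly as in the two-row block above. The function name `build_dlt_matrix` and the sample points are assumptions made for illustration.

```python
import numpy as np

def build_dlt_matrix(src, dst):
    """Stack the two DLT rows of every correspondence into a (2n, 9) matrix."""
    rows = []
    for (x, y), (xp, yp) in zip(np.asarray(src, float), np.asarray(dst, float)):
        # Row derived from the y' equation.
        rows.append([0, 0, 0, -x, -y, -1, yp * x, yp * y, yp])
        # Row derived from the x' equation.
        rows.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y, -xp])
    return np.array(rows, dtype=float)

# Illustrative example: the unit square mapped to a square twice its size.
src = [[0, 0], [1, 0], [1, 1], [0, 1]]
dst = [[0, 0], [2, 0], [2, 2], [0, 2]]
A = build_dlt_matrix(src, dst)

# The true homography diag(2, 2, 1), flattened row by row, lies in the null space of A.
h_scale = np.array([2, 0, 0, 0, 2, 0, 0, 0, 1], dtype=float)
residual = A @ h_scale
```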

4. Solving the System

The equation $A h = 0$ is a homogeneous linear system. To find a nontrivial solution, one minimizes $\|A h\|$ subject to $\|h\| = 1$.
This can be solved using Singular Value Decomposition (SVD):

$$A = U \Sigma V^T.$$

The solution $h$ corresponds to the last column of $V$, i.e., the singular vector associated with the smallest singular value of $A$.
The vector $h$ is then reshaped into the $3 \times 3$ matrix $H$.
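The SVD step can be sketched as follows. The helper name `solve_dlt` is an assumption, and the test matrix is synthetic: its rows are made orthogonal to a known unit vector so the expected solution is known in advance.

```python
import numpy as np

def solve_dlt(A):
    """Return the unit vector h minimizing ||A h|| and its 3x3 reshaping."""
    # full_matrices=True (the default) matters for exactly four correspondences:
    # A is then 8x9, and only the full 9x9 V^T contains the null-space vector.
    _, _, Vt = np.linalg.svd(A)
    h = Vt[-1]  # last row of V^T, i.e. last column of V
    return h, h.reshape(3, 3)

# Synthetic 8x9 system with a known solution h_true.
rng = np.random.default_rng(0)
h_true = rng.normal(size=9)
h_true /= np.linalg.norm(h_true)
R = rng.normal(size=(8, 9))
A = R - np.outer(R @ h_true, h_true)  # now A @ h_true = 0

h_est, H_est = solve_dlt(A)
```

The recovered vector equals `h_true` only up to sign, which is harmless because a homography is defined up to scale.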

Coordinate Normalization

To improve numerical stability, the input points are often normalized before constructing $A$:

  1. Translate the points so that their centroid is at the origin.
  2. Scale them so that the average distance from the origin is $\sqrt{2}$.

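After computing the homography $\tilde{H}$ in normalized coordinates, the result must be denormalized. If $T$ and $T'$ are the normalizing transformations of the source and target points, the homography in the original coordinates is $H = T'^{-1} \tilde{H} T$.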
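The two normalization steps and the denormalization can be sketched like this. The function names and the sample points are assumptions, and the denormalization uses the standard composition of the inverse target normalization, the normalized homography, and the source normalization.

```python
import numpy as np

def normalization_matrix(pts):
    """3x3 similarity that centers pts and scales their mean distance to sqrt(2)."""
    pts = np.asarray(pts, dtype=float)
    centroid = pts.mean(axis=0)
    mean_dist = np.linalg.norm(pts - centroid, axis=1).mean()
    s = np.sqrt(2) / mean_dist
    return np.array([[s, 0.0, -s * centroid[0]],
                     [0.0, s, -s * centroid[1]],
                     [0.0, 0.0, 1.0]])

def denormalize(H_norm, T_src, T_dst):
    """Undo the normalization: H = inv(T_dst) @ H_norm @ T_src."""
    return np.linalg.inv(T_dst) @ H_norm @ T_src

# Illustrative pixel coordinates of four clicked corners.
pts = np.array([[100, 200], [300, 200], [300, 400], [100, 400]], dtype=float)
T = normalization_matrix(pts)
pts_h = np.hstack([pts, np.ones((len(pts), 1))])
normalized = (T @ pts_h.T).T[:, :2]
```

For these four points the normalized coordinates are the corners $(\pm 1, \pm 1)$, so the entries of $A$ stay close to unit magnitude instead of mixing values such as 1 and 120000.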

First I build an interactive panel that allows selecting four or more reference points in the original image. The implementation is based on https://www.geeksforgeeks.org/python/displaying-the-coordinates-of-the-points-clicked-on-the-image-using-python-opencv/.


The selected points for the homography are shown in red on the original image.


Discussion of how homography enables rectification:

A homography makes it possible to remove the perspective distortion that appears in images of planar surfaces. When a document or sign is photographed at a non-perpendicular angle, it is distorted by a projective transformation. Given at least four point correspondences between the distorted image and a target rectangle, the homography matrix $H$ can be estimated with the Direct Linear Transformation (DLT) algorithm. The warp is computed in reverse: each pixel of the output image is mapped back into the original image through $H^{-1}$, which avoids the overlapping pixels and holes that forward projection would create. Because the mapped coordinates are usually non-integer, bilinear interpolation computes each output value from the four surrounding pixels of the original image. The result is a geometrically corrected, fronto-parallel view of the planar surface in which lines that converged in the photo are parallel again and the perspective distortion is removed.
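Inverse mapping with bilinear interpolation can be sketched in plain NumPy as below. This is an illustrative version under simplifying assumptions (the name `warp_inverse_bilinear` is made up, pixels without source data are set to 0, and the source must be at least 2x2); it is not necessarily how the notebook implements the warp.

```python
import numpy as np

def warp_inverse_bilinear(src, H, out_shape):
    """Warp src by homography H (source -> output) using inverse mapping."""
    h_out, w_out = out_shape
    h_src, w_src = src.shape[:2]

    # Homogeneous coordinates of every output pixel, as a 3 x N array.
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    out_pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)]).astype(float)

    # Map each output pixel back into the source image.
    src_pts = np.linalg.inv(H) @ out_pts
    sx = src_pts[0] / src_pts[2]
    sy = src_pts[1] / src_pts[2]

    # Only positions inside the source image can be sampled.
    valid = (sx >= 0) & (sx <= w_src - 1) & (sy >= 0) & (sy <= h_src - 1)
    sx = np.where(valid, sx, 0.0)
    sy = np.where(valid, sy, 0.0)

    # Top-left neighbour and fractional offsets for the bilinear weights.
    x0 = np.clip(np.floor(sx).astype(int), 0, w_src - 2)
    y0 = np.clip(np.floor(sy).astype(int), 0, h_src - 2)
    fx = (sx - x0)[:, None]
    fy = (sy - y0)[:, None]

    img = src.astype(float).reshape(h_src, w_src, -1)
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x0 + 1]
    bottom = (1 - fx) * img[y0 + 1, x0] + fx * img[y0 + 1, x0 + 1]
    out = (1 - fy) * top + fy * bottom
    out[~valid] = 0.0  # no source data for these output pixels

    return out.reshape((h_out, w_out) + src.shape[2:])

# Illustrative example: shift a 4x4 ramp image half a pixel to the right.
src = np.arange(16, dtype=float).reshape(4, 4)
H_shift = np.array([[1.0, 0.0, 0.5],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
warped = warp_inverse_bilinear(src, H_shift, (4, 4))
```

Every output pixel is visited exactly once, so no holes appear, and a half-pixel shift yields values halfway between horizontal neighbours.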

part_2

Part 2: Creative Application (Exploration and Extension)

Use your homography implementation creatively to demonstrate a real-world or artistic application.

For this creative part I decided to build a box unfolder. It starts from two images of a cereal box, gift, or any other box-shaped item. The first photo is an angled view showing the front, top, and one side of the box; the second shows the back, bottom, and the other side.


In the next step the user is asked to select four correspondence points for each side of the box, which are saved to a JSON file for later use. The order in which the points are selected matters, because it must match the order of the corresponding target points used for the homography.
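One possible way to store the clicked points is a JSON object keyed by face name, since JSON arrays preserve the click order. The file name, the key `"front"`, the helper names, and the coordinates below are all hypothetical.

```python
import json
import tempfile
from pathlib import Path

def save_face_points(path, face_points):
    """Write {face name: [[x, y], ...]} to JSON; list order is preserved."""
    Path(path).write_text(json.dumps(face_points, indent=2))

def load_face_points(path):
    """Read the points back in the same order they were saved."""
    return json.loads(Path(path).read_text())

# Hypothetical clicks for one face, ordered
# top-left, top-right, bottom-right, bottom-left.
face_points = {"front": [[120, 80], [410, 95], [400, 520], [110, 500]]}

with tempfile.TemporaryDirectory() as tmp:
    json_path = Path(tmp) / "box_points.json"
    save_face_points(json_path, face_points)
    restored = load_face_points(json_path)
```

If the target rectangle is listed in the same corner order, the i-th clicked point pairs with the i-th target corner when the homography is estimated.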


Applying the homography function from Task 1 to each side and assembling the results in a standard box pattern yields a 2D version of the box that can be stored on a device to read the instructions later, or reprinted to recreate the box. As the final unfolded box output shows, image quality is very important. Because the photos were taken from a short distance with a phone camera, the corners of the box that were farther from the camera are blurred by the limited depth of field (DoF). Using a proper camera with a small aperture and photographing from a slightly larger distance to increase the depth of field can improve the projection drastically. Additionally, because the sides of the box are not perfectly flat, particularly the front and back, the projection is slightly curved. This can be fixed by straightening the box or adding more reference points.
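The assembly step could look roughly like the sketch below. The cross-shaped layout, the face names, and the function `assemble_net` are assumptions for illustration; the layout used in the project may differ, and real faces may first need rotating or flipping to sit the right way round.

```python
import numpy as np

def assemble_net(faces, w, h, d):
    """Paste six rectified faces into a cross-shaped net.

    faces maps a face name to an RGB array (rows x columns x 3):
    front/back are h x w, left/right are h x d, top/bottom are d x w.
    """
    canvas = np.full((h + 2 * d, 2 * w + 2 * d, 3), 255, dtype=np.uint8)
    # (row, column) of each face's top-left corner on the canvas.
    origin = {
        "top": (0, d),
        "left": (d, 0),
        "front": (d, d),
        "right": (d, d + w),
        "back": (d, 2 * d + w),
        "bottom": (d + h, d),
    }
    for name, (r, c) in origin.items():
        face = faces[name]
        canvas[r:r + face.shape[0], c:c + face.shape[1]] = face
    return canvas

# Illustrative example with flat grey faces standing in for rectified photos.
w, h, d = 6, 8, 2
sizes = {"front": (h, w), "back": (h, w), "left": (h, d),
         "right": (h, d), "top": (d, w), "bottom": (d, w)}
faces = {name: np.full(shape + (3,), i * 40, dtype=np.uint8)
         for i, (name, shape) in enumerate(sizes.items())}
net = assemble_net(faces, w, h, d)
```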

part_3

Part 3: Warping Comparison

Compare triangular mesh warping and thin-plate spline (TPS) warping.

For Part 3, warp one source image to a target image of the same digit (from the provided images) using both techniques: for example, warp digit 3-a to match digit 3-b, and digit 7-a to match digit 7-b. No face morphing is needed here; the purpose of Part 3 is to understand the differences between the two warping techniques (triangular mesh warping and thin-plate spline warping).

Load images


To select the landmarks for the digits, the select_points function from Task 1 was slightly adjusted, and the landmarks were saved to their corresponding JSON files.


The warping methods were implemented using SciPy, NumPy, and cv2 functions.


Find your own images and show the differences between triangular mesh warping and thin-plate spline (TPS) warping:

I chose images of daylilies and created a dense set of 70 landmarks to allow warping one daylily variety into another. That way, new varieties can be created with different petal sizes and colorings.


Analysis explaining the advantages/disadvantages of each method

Compare visually and discuss differences in smoothness, continuity, and computational cost:

To test the stability of the two approaches, many landmark points (39) were selected for warping digit 3 and only a few (14) for digit 7. The most complex case is the daylily warp, with 70 reference points.

The triangular mesh warp divides the image into small triangles, each of which is warped independently by its own affine transformation. The method offers great local control and preserves edges and details well. Because each triangle is handled independently, the cost grows linearly with the number of triangles, which keeps it efficient even as the triangle count increases. In the daylily example in particular, small discontinuities are visible around the petals, making them appear flat and segmented. For the MNIST digit images this is beneficial, as we want strong edges separating the digit from the background. With a large number of landmarks on a simple image, such as digit 3, the method appears more stable than TPS because it warps more locally and precisely.
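The per-triangle affine map can be sketched as follows. The helper `triangle_affine` and the two-triangle example are assumptions, chosen to show why the warp stays continuous across a shared edge while its slope can change abruptly there.

```python
import numpy as np

def triangle_affine(src_tri, dst_tri):
    """2x3 affine matrix M with M @ [x, y, 1] = (x', y') for all three vertex pairs."""
    src = np.hstack([np.asarray(src_tri, float), np.ones((3, 1))])  # 3x3
    dst = np.asarray(dst_tri, float)                                # 3x2
    return np.linalg.solve(src, dst).T

# Unit square split along the diagonal a-c; only corner c moves.
a, b, c, d = (0, 0), (1, 0), (1, 1), (0, 1)
c_new = (1.5, 1.5)
M1 = triangle_affine([a, b, c], [a, b, c_new])
M2 = triangle_affine([a, c, d], [a, c_new, d])

# Both maps agree on the shared edge, here checked at its midpoint.
mid = np.array([0.5, 0.5, 1.0])
mid_1 = M1 @ mid
mid_2 = M2 @ mid
```

The linear parts `M1[:, :2]` and `M2[:, :2]` differ, so straight features bend at triangle borders. This kind of kink may be what shows up as the flat, segmented look around the petals.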

The TPS warp generates a single deformation field that uses all landmarks to create a smooth transformation. It produces very natural-looking results and works best for images with smooth shape changes, such as the daylily pictures. However, it warps the background unpredictably when control points are few or concentrated in a single area. It also requires more computation: it runs in cubic time in the number of landmarks, which leads to longer processing times and reduced scalability. It can easily become unstable, as visible in the digit 3 images, where the high number of landmark points creates swirls and folds in the plane that make the result look unnatural and dissimilar to the target image.
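A thin-plate spline can be fitted with plain NumPy, as in the sketch below, which uses the standard kernel $U(r) = r^2 \log r$ plus an affine part. The function names and landmark values are assumptions, and the notebook itself may rely on SciPy or cv2 routines instead.

```python
import numpy as np

def tps_kernel(r):
    """Radial basis U(r) = r^2 log r, with U(0) = 0."""
    out = np.zeros_like(r)
    mask = r > 0
    out[mask] = r[mask] ** 2 * np.log(r[mask])
    return out

def fit_tps(src, dst):
    """Solve for the TPS weights (n x 2) and affine part (3 x 2) mapping src to dst."""
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    n = len(src)
    K = tps_kernel(np.linalg.norm(src[:, None] - src[None, :], axis=2))
    P = np.hstack([np.ones((n, 1)), src])
    L = np.zeros((n + 3, n + 3))
    L[:n, :n] = K
    L[:n, n:] = P
    L[n:, :n] = P.T
    rhs = np.vstack([dst, np.zeros((3, 2))])
    # Dense (n + 3) x (n + 3) solve: this is the cubic-time step.
    params = np.linalg.solve(L, rhs)
    return params[:n], params[n:]

def apply_tps(pts, src, weights, affine):
    """Evaluate the fitted spline at arbitrary points."""
    pts = np.asarray(pts, float)
    src = np.asarray(src, float)
    U = tps_kernel(np.linalg.norm(pts[:, None] - src[None, :], axis=2))
    P = np.hstack([np.ones((len(pts), 1)), pts])
    return U @ weights + P @ affine

# Illustrative landmarks: square corners stay fixed, the centre moves right.
src_pts = [[0, 0], [1, 0], [1, 1], [0, 1], [0.5, 0.5]]
dst_pts = [[0, 0], [1, 0], [1, 1], [0, 1], [0.6, 0.5]]
weights, affine = fit_tps(src_pts, dst_pts)
mapped = apply_tps(src_pts, src_pts, weights, affine)
```

Every landmark contributes to the displacement at every pixel, which explains both the smoothness of the result and the way a few clustered landmarks can drag distant background along. For image warping the spline is usually fitted from target landmarks to source landmarks so that it can be used for inverse mapping.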

The triangular mesh warp is best for precise local results at high speed on rigid structures. The TPS warp, on the other hand, generates smooth global transformations at a higher computational cost. The choice between the two depends on whether the application needs local precision and speed or a smooth global deformation.